Data Visualization

Cory Whitney

Data visualization: getting stuck

  • Open RStudio

  • Type ‘?’ followed by a function, package or data set name in the R console

  • Search the web for a copy of the error message plus “R”

  • Help > Cheatsheets > Data Visualization with ggplot2
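The help routes above can be tried directly in the console. This is a minimal sketch using only base R help functions; read.csv, iris and the datasets package are just example targets.

```r
# Help page for a function
?read.csv

# Help page for a data set
?iris

# Overview of everything documented in a package
help(package = "datasets")

# List objects on the search path whose names match a pattern,
# useful when you only remember part of a function name
matches <- apropos("^read\\.csv")
matches
```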

Data visualization: getting help

  • Many talented programmers
  • Some scan the web and answer issues

https://stackoverflow.com/

Getting your data in R

Load data

  • Load the data
participants_data <- read.csv("participants_data.csv")
  • Keep your data in the same folder structure as the .Rproj file, at or below its level
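A quick way to confirm that a file was found and read as expected is to check the path and then inspect the result. The sketch below uses a hypothetical two-column toy file written to a temporary folder, since participants_data.csv itself is not included here; the values are made up. In a real project, getwd() shows the folder that relative paths such as "participants_data.csv" are resolved against.

```r
# Hypothetical toy data standing in for participants_data.csv
toy_data <- data.frame(
  academic_parents = c("yes", "no", "yes"),
  days_to_email_response = c(1, 3, 2)
)

# Write the toy data to a temporary folder so the example is self-contained
csv_path <- file.path(tempdir(), "participants_data.csv")
write.csv(toy_data, csv_path, row.names = FALSE)

# Confirm the file is where R will look before trying to read it
file.exists(csv_path)

# Read it back the same way as on the slide
participants_toy <- read.csv(csv_path)

# Inspect column names and types right after loading
str(participants_toy)
```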

Creating a barplot in base R

R has several systems for making graphs

  • Base R
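  • Create a barplot with the plot() function; plot() draws a barplot when it is given a factor
# read.csv() returns text columns as character in R >= 4.0, so convert to a factor first
plot(as.factor(participants_data$academic_parents))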

Figure: Bar plot of the number of observations of binary data related to academic parents

Creating a boxplot in base R

  • Create a boxplot with the same plot() function; a factor on x and a numeric variable on y gives one box per group
plot(as.factor(participants_data$academic_parents), participants_data$days_to_email_response)

Figure: Boxplot of days to email response grouped by binary data related to academic parents
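plot() chooses the plot type from the class of its arguments, which is why one function produced both figures. The sketch below spells out the same two plots with the dedicated base R functions. The data are hypothetical toy values; only the column names follow the slides.

```r
# Hypothetical toy values; the column names follow the slides
toy_data <- data.frame(
  academic_parents = factor(c("yes", "no", "yes", "no", "yes")),
  days_to_email_response = c(1, 4, 2, 3, 1)
)

# Explicit counterpart of plot(<factor>): one bar per level, height = count
counts <- table(toy_data$academic_parents)
barplot(counts)

# Explicit counterpart of plot(<factor>, <numeric>): one box per level
boxplot(days_to_email_response ~ academic_parents, data = toy_data)
```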

ggplot2: overview

Many libraries and functions for graphs in R…

  • ggplot2 is one of the most elegant and most versatile.

  • ggplot2 implements the grammar of graphics to describe and build graphs.

  • Do more and do it faster by learning one system and applying it in many places.

  • Learn more about ggplot2 in “The Layered Grammar of Graphics”

http://vita.had.co.nz/papers/layered-grammar.pdf
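In practice, the layered grammar means a plot is assembled by adding pieces with +. A minimal sketch, assuming the built-in mtcars data set as a stand-in for your own data:

```r
library(ggplot2)

# Data and aesthetic mappings only: this draws axes but no data yet
base_plot <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# Add a geom layer to draw the observations as points
scatter <- base_plot + geom_point()

# Add a second layer on top of the first: a straight-line fit
scatter_with_trend <- scatter +
  geom_smooth(method = "lm", formula = y ~ x)

scatter_with_trend
```

Each intermediate object can be printed on its own, which makes it easy to see what each layer contributes.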

ggplot2: names and email

Example from your data

library(ggplot2)
ggplot(data = participants_data, aes(x=letters_in_first_name, y=days_to_email_response)) + 
  geom_point()

Figure: Scatterplot of days to email response as a function of the letters in your first name

Want to understand how all the pieces fit together? See the R for Data Science book: http://r4ds.had.co.nz/

ggplot2: add color and size

ggplot(data = participants_data, aes(x=letters_in_first_name, y=days_to_email_response, color=academic_parents, size=working_hours_per_day)) + 
  geom_point()

Figure: Scatterplot of days to email response as a function of the letters in your first name, with colors representing binary data related to academic parents and working hours per day as bubble sizes.

Make more graphs

ggplot2: iris data

Example from Anderson's iris data set

ggplot(data=iris, aes(x=Sepal.Length, y=Petal.Length, color=Species, size=Petal.Width))+ 
  geom_point()

Figure: Scatterplot of iris petal length as a function of sepal length, with colors representing iris species and petal width as bubble sizes.

ggplot2: diamonds price

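aes() accepts expressions such as log(), so variables can be transformed inside the plot call

# alpha is a constant here, so it is set in geom_point() rather than mapped in aes()
ggplot(data = diamonds, aes(x = carat, y = price)) + geom_point(alpha = 0.2)
ggplot(data = diamonds, aes(x = log(carat), y = log(price))) + geom_point(alpha = 0.2)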

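An alternative to transforming the variables is to transform the axes. The sketch below uses scale_x_log10() and scale_y_log10(), which work in base 10 rather than the natural log used above; the pattern of points is the same, but the tick labels stay in the original units.

```r
library(ggplot2)

# Transform the axes instead of the data
p_log_scales <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  scale_x_log10() +
  scale_y_log10()

p_log_scales
```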

ggplot2: diamonds color shape

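library(dplyr)
# top_n() without wt ranks by the last column (here z),
# so this keeps the 100 rows with the largest z (ties included)
dsmall <- top_n(diamonds, n = 100)
# Plot with a different color for each diamond color grade
ggplot(data = dsmall, aes(x = carat, y = price, color = color)) + geom_point()
# Plot with a different shape for each cut
ggplot(data = dsmall, aes(x = carat, y = price, shape = cut)) + geom_point()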

ggplot2: set parameters

Set parameters manually with I() (“Inhibit Interpretation/Conversion of Objects”), which makes ggplot2 use the value as is instead of mapping it through a scale

ggplot(data = diamonds, aes(carat, price, alpha=I(0.1), color=I("blue"))) + geom_point()
ggplot(data = diamonds, aes(carat, price, alpha=I(0.4), color=I("green"))) + geom_point()

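Another common way to fix a parameter is to pass it to the geom, outside aes(). The sketch below contrasts setting a constant with mapping a column; the choice of cut as the mapped column is only an example.

```r
library(ggplot2)

# Setting: constants are passed to the geom, outside aes(); no legend is drawn
p_set <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1, color = "blue")

# Mapping: a data column goes inside aes(); ggplot2 builds a scale and a legend
p_map <- ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.1)

p_set
p_map
```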

ggplot2: geom options

The geom_*() functions define the type of plot, e.g. points, line, boxplot, path, smooth. Several geoms can be combined in one plot by adding them with +.

ggplot(data=dsmall, aes(x=carat, y=price))+
geom_point()+
geom_smooth()

ggplot2: smooth function

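geom_smooth() selects a smoothing method based on the size of the data: by default it uses loess for fewer than 1,000 observations and a GAM (mgcv::gam()) otherwise. Use method = to specify your preferred smoothing method.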

ggplot(data=dsmall, aes(x=carat, y=price))+ geom_point()+ geom_smooth()

ggplot(data=diamonds, aes(x=carat, y=price))+ geom_point()+ 
geom_smooth(method = 'glm')

Figure: ggplot2 lines and smoothing options
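Besides method, geom_smooth() takes arguments such as se (show or hide the confidence band) and, for loess, span (how closely the curve follows the points). A minimal sketch using the mpg data set that ships with ggplot2; the span value is an arbitrary choice for illustration.

```r
library(ggplot2)

# Straight-line fit without the confidence band
p_lm <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local regression; a smaller span follows the points more closely
p_loess <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.5)

p_lm
p_loess
```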

ggplot2: boxplots

  • Boxplots can be displayed through geom_boxplot().
ggplot(data=diamonds, aes(x=color, y=price/carat)) + 
geom_boxplot()

ggplot2: jitter points

  • Jittered plots with geom_jitter() show all points.
ggplot(data=diamonds, aes(x=color, y=price/carat)) + 
geom_boxplot()+ 
geom_jitter()

ggplot2: adding alpha

If points overplot, lowering alpha (making them more transparent) can help.

ggplot(data=diamonds, aes(x=color, y=price/carat, alpha=I(0.1))) + 
geom_boxplot()+ 
geom_jitter()

ggplot2: geom_density

ggplot(data = diamonds, aes(x=carat)) +
geom_density()

ggplot(data = diamonds, aes(x=carat, color = color)) +
geom_density()

ggplot(data = diamonds, aes(x=carat, color = color, alpha=I(0.3))) +
geom_density()

Figure: ggplot2 density plots
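The same distribution can also be shown as a histogram with geom_histogram(), which counts observations in bins instead of estimating a smooth density. A minimal sketch; the binwidth of 0.1 carat is an assumption chosen for illustration.

```r
library(ggplot2)

# Histogram of carat: each bar counts the diamonds in a 0.1-carat bin
p_hist <- ggplot(data = diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)

p_hist
```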

ggplot2: subset

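Use factor() to group your data. Mapping the numeric cyl to color gives a continuous color scale and a single smooth line; factor(cyl) gives one color and one smooth line per number of cylinders.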

ggplot(data = mpg, aes(x=displ, y=hwy,  color = cyl))+ 
geom_point()+
geom_smooth(method="lm")

ggplot(data = mpg, aes(x=displ, y=hwy,  color = factor(cyl)))+ 
geom_point()+
geom_smooth(method="lm")

Figure: ggplot2 subset with smooth line
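The difference between the two plots comes from the type of the variable. A short sketch to check this in the console before plotting:

```r
library(ggplot2)

# cyl is stored as a number, so ggplot2 treats it as continuous
is.numeric(mpg$cyl)

# factor() turns it into a categorical variable with a fixed set of levels
cyl_levels <- levels(factor(mpg$cyl))
cyl_levels

# Number of observations in each group
table(factor(mpg$cyl))
```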

ggplot2: "slow ggplotting"

  • pulling aes() out of ggplot()
  • using fewer functions; for example, using labs() to add a title instead of ggtitle()
  • using functions multiple times; for example, aes(x = var1) + aes(y = var2) rather than aes(x = var1, y = var2)
  • using base R and tidyverse functions; for other packages, using the :: style to call them
  • writing out arguments (no shortcuts): aes(x = gdppercap), not aes(gdppercap)

https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1

ggplot2: not slow example

ggplot code in non-slow fashion

ggplot(mtcars, aes(mpg, y = hp, col = gear)) +
  geom_point() +
  ggtitle("My Title") +
  labs(x = "the x label", y = "the y label", col = "legend title")

ggplot2: slow ggplotting example

'Slow ggplotting' version for the same plot

ggplot(data = mtcars) +
  aes(x = mpg) +
  labs(x = "the x label") +
  aes(y = hp) +
  labs(y = "the y label") +
  geom_point() +
  aes(col = gear) +
  labs(col = "legend title") +
  labs(title = "My Title")

https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1

ggplot2: geom_tile

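  • Use dplyr, ggplot2 and reshape2
library(dplyr)     # select_if()
library(reshape2)  # melt()
library(ggplot2)

# Keep only the numeric columns
part_data <- select_if(participants_data, is.numeric)

# Correlation matrix rounded to one decimal, reshaped to long format
cormat <- round(cor(part_data), 1)
melted_cormat <- melt(cormat)

ggplot(data = melted_cormat, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile()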

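The reshaping step can also be done without reshape2. The sketch below is an alternative, not the approach used on the slide: it uses the built-in mtcars data as a stand-in and base R's as.table() and as.data.frame(), which name the value column Freq rather than value.

```r
library(ggplot2)

# Correlation matrix of a built-in data set, rounded to one decimal
cormat <- round(cor(mtcars), 1)

# One row per matrix cell, with columns Var1, Var2 and Freq
long_cormat <- as.data.frame(as.table(cormat))
head(long_cormat)

# Same tile plot as before, with Freq as the fill variable
p_tile <- ggplot(data = long_cormat, aes(x = Var1, y = Var2, fill = Freq)) +
  geom_tile()

p_tile
```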

Export Figures

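# Open a PNG device: 7 x 6 inches at 300 dpi
png(filename = "cortile.png", width = 7, height = 6, units = "in", res = 300)

# print() makes sure the plot is drawn even when the code is run via source() or inside a function
print(ggplot(data = melted_cormat, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)))

# Close the device to write the file
dev.off()
  • Check with the journal about required size, resolution etc.
?pdf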
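ggplot2 also has its own export helper, ggsave(), which picks the graphics device from the file extension. A minimal sketch, assuming a simple mtcars scatterplot and a temporary output folder as stand-ins; the 7 x 6 inch size matches the png() call above.

```r
library(ggplot2)

# Stand-in plot; replace with your own ggplot object
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary folder so the example leaves no files in the project
out_file <- file.path(tempdir(), "example_plot.pdf")

# The .pdf extension selects a PDF device; width and height are in inches
ggsave(filename = out_file, plot = p, width = 7, height = 6, units = "in")

file.exists(out_file)
```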

gganimate: datasauRus

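  • Use datasauRus, ggplot2 and gganimate
library(ggplot2)
library(gganimate)
library(datasauRus)
ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  # relative timing: 3 units moving between data sets, 1 unit pausing on each
  transition_states(dataset, transition_length = 3, state_length = 1) +
  ease_aes('cubic-in-out')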

gganimate: Datasaurus Dozen

Figure: animation of the Datasaurus Dozen data sets

gganimate: diamonds carat

  • Use tidyverse, ggplot2 and gganimate
ggplot(data = dsmall, aes(x = carat, y = price, color = color)) + 
  geom_line() +
  transition_reveal(carat) + 
  ease_aes("linear") +
  labs(title='Diamond carat: {frame_along}')

Tasks for the afternoon: Basic

Test your new skills

  • Use scatter plots, bar charts and boxplots
  • Vary the sample and run the same analysis and plots
  • Save your most interesting figure and share it with us tomorrow

Tasks for the afternoon: Advanced

Your turn to perform

  • Import data from an external source (e.g. FAO, World Bank)
  • Display those data in an interactive plot
  • Play around with the design
  • Export your most interesting figure and share it with us